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Abstract. Approximate Bayesian Computation (ABC) is a useful class of meth¬ 
ods for Bayesian inference when the likelihood function is computationally in¬ 
tractable. In practice, the basic ABC algorithm may be inefficient in the pres¬ 
ence of discrepancy between prior and posterior. Therefore, more elaborate meth¬ 
ods, such as ABC with the Markov chain Monte Carlo algorithm (ABC-MCMC), 
should be used. However, the elaboration of a proposal density for MCMC is a 
sensitive issue and very difficult in the ABC setting, where the likelihood is in¬ 
tractable. We discuss an automatic proposal distribution useful for ABC-MCMC 
algorithms. This proposal is inspired by the theory of quasi-likelihood (QL) func¬ 
tions and is obtained by modelling the distribution of the summary statistics 
as a function of the parameters. Essentially, given a real-valued vector of sum¬ 
mary statistics, we reparametrize the model by means of a regression function 
of the statistics on parameters, obtained by sampling from the original model 
in a pilot-run simulation study. The QL theory is well established for a scalar 
parameter, and it is shown that when the conditional variance of the summary 
statistic is assumed constant, the QL has a closed-form normal density. This idea 
of constructing proposal distributions is extended to non constant variance and 
to real-valued parameter vectors. The method is illustrated by several examples 
and by an application to a real problem in population genetics. 

Keywords: Estimating function, Likelihood-free methods, Markov chain Monte 
Carlo, Proposal distribution, Pseudo-likelihood. 


1 Introduction 

Many statistical applications in diverse fields such as biology, genetics and finance of¬ 
ten involve stochastic models with analytically or computationally intractable likeli¬ 
hood functions. The rapidly growing literature on Approximate Bayesian Computation 
(ABC) has led to a set of methods which do not involve direct calculation of the likeli¬ 
hood, leading to Bayesian inference that is approximate in a sense that will be specified 
later. 

ABC methods are becoming popular in genetics (Siegmund et ah, 2008; Foil et ah, 
2008), epidemiology (Blum and Tran, 2010; Tanaka et ah, 2006) and in population 
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biology (Ratmann et al., 2007; Hamilton et al., 2005; Cornuet et al., 2008) among other 
areas. 

Formally, let y = (jj±, ..., y n ) be a random sample of size n drawn from a statistical 
model 7 r(y | 9) indexed by the parameter 9 £ 0 C IR P . The likelihood for 9, correspond¬ 
ing to n(y | 9) is L^(9), which is not available in closed expression. For a certain prior 
7 r(0), the aim is to obtain the posterior distribution 7 tn{9 \ y) oc Ln(9)tt(9), but as 
Ln(9) is inaccessible, tvn{9 | y) cannot be approximated by directly evaluating L^{9). 

This difficulty may be overcome by using ABC methods. Specifically, let s = s{y) £ 
S C ]R P be a vector of observable summary statistics (e.g. mean, variance, quantiles 
etc.), which may not be sufficient, let p(s,s 0 b s ) be a metric distance between s and its 
observed value s 0 6 S with e > 0 the tolerance parameter. ABC methods approximate 
kn(0 | y) by 



where 7r e (0,s | s 0 t> s ) oc 7r(0)7r(s | 9) I p<£ , and Ip< £ is the indicator function for the 
event {s £ S \ p(s 0 b s ,s) < e}. They require a choice of e and for this purpose several 
authors (Bortot et al., 2007; Faisal et al., 2013; Ratmann et al., 2014; Barnes et al., 
2012; Aeschbacher et al., 2012) suggest approaches where e is estimated as part of an 
extended model with respect to ir(y | 9 ), while a recent approach based on diagnostic 
tools for ABC can be found in Prangle et al. (2013a). In this work another criterion for 
choosing e is discussed. 

The basic version of the ABC algorithm relies on simulation by the mixture repre¬ 
sentation method consisting in generating, say, T values of 9 from 7 t(9) and using them 
to generate the corresponding T values of s from n(y \ 9) at the simulated 9. We accept 
all values of 9 such that p( s, s 0 b s ) < e. For e —>• 0 the ABC method has been proven to 
return a consistent estimator of the posterior 7 t(9 | s 0 b s ) and under some assumptions 
it is also possible to provide the approximation error as shown in Biau et al. (2012). 
Moreover, if s is sufficient and e — > 0, then n e (9 \ s Q b s ) —> npf(9 \ y). There is a certain 
agreement in that low dimensional, but informative summary statistics improve the 
accuracy of the ABC approximation (Blum et al., 2013). 

One drawback of the basic ABC algorithm is that it can be extremely inefficient 
when the discrepancy between n(9) and ttn{9 \ y) is relevant. Unfortunately, as L N (9) 
is intractable, the discrepancy between 7 t{9) and kn{9 | y) is difficult to know a priori 
and not easy to assess. To deal with this issue, more advanced Monte Carlo methods, 
such as ABC-MCMC, originally developed in Marjoram et al. (2003) and further ana¬ 
lyzed in several papers such as Beaumont et al. (2009a); Andrieu and Roberts (2009); 
Lee (2012), or Sequential Monte Carlo (SMC) methods (see, e.g., Beaumont et al., 
2009b; Sisson et al., 2007) may be used. All these methods attempt to account for 
the observed data at the proposal stage. However, to accomplish this task, a proposal 
distribution or a perturbing kernel is required, which in practice is supplied by the 
analyst. 
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Another aspect of the method, which is of major concern in the ABC literature, is 
the choice of s , which should be informative for 9. The same concern applies here, as we 
require s not to be ancillary with respect to 9. Many suggestions can be found in the 
current literature. For instance, Fearnhead and Prangle (2012) propose considering the 
posterior mean of 9 , i.e. E v t^g\ SobB )(9). The latter is estimated by means of a pilot-run 
simulation that depends on the specific observed sample. Moreover, Ruli et al. (2013) 
suggest choosing s as the score of the composite likelihood function obtained from 
■n(y | Q). 

The present work focuses on the study of a class of proposal distributions for 
ABC-MCMC, and a method for building proposal densities, which target the poste¬ 
rior 7 j e {6 | s 0 b s ), is illustrated. Such proposal distributions depend on the model at 
hand and account for the observed data. These distributions for 9 are constructed in 
a way that leads to adopting a normal kernel on the space of s, and then consider a 
reparametrization from s to 9 by a suitable regression function f(9) = E n ^Q)(s \ 9). 
A recent approach, strongly connected with the use of such a regression function, can 
be found in Ratmann et al. (2014), where the f{9) is the binding function in indirect 
inference (Gourieroux et al., 1993). 

We show that for scalar parameter problems and f{9) such proposal distributions 
arise from the class of quasi-likelihood functions (QL) of 9 (McCullagh, 1991) denoted 
by Lq(9). For multidimensional parameter problems, the QL is not tractable, but the 
idea can still be generalized to these contexts using asymptotic arguments. Indeed, for 
the vector of parameters 9 , we consider a multivariate normal kernel and a multivariate 
transformation from s to 9. For both the scalar and the multi parameter cases, these 
transformations are typically not available analytically and we estimate them in a pilot- 
run simulation. This pilot-run simulation is performed regardless of the specific observed 
sample and thus it can serve for routine analysis. This is an appealing feature of the 
proposed method as will be shown later by an application to a Genome Wide Association 
Study (GWAS). 

Despite the fact that under some more elaborate requirements for the proposal 
distribution, later discussed, we end up in a proposal which is not the QL for 9 , we 
think that the connection of ABC and QL is important because Ln(9) is not available 
and the estimation theory of the QLs guarantees, asymptotically, that Lq{9 ) targets 
L n {9). 

The structure of the paper is as follows: Section 2 discusses Lq(9), which inspires 
the proposal distributions considered throughout the paper. These distributions will be 
embedded in the ABC-MCMC algorithm. The proposed ABC algorithm, ABC q ;, for 
scalar parameters is formally discussed in Section 3, while the generalization to p > 1 is 
presented in Section 4. Section 5 illustrates the proposed method with some examples 
from the ABC literature and an important application to GWAS for population genetic 
isolates. Conclusions and further remarks are given in Section 6. 
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2 The two relevant tools: quasi-likelihood and the ABC- 
MCMC algorithm 


The theory and use of estimating equations and that of the related quasi- and quasi¬ 
profile likelihood functions have received a good deal of attention in recent years; see, 
among others, Liang and Zeger (1995); Barndorff-Nielsen (1995); Desmond (1997); 
Heyde (1997); Adimari and Ventura (2002); Severini (2002); Wang and Hanfelt (2003); 
Jprgensen and Knudsen (2004); Bellio et al. (2008). In addition, Ventura et al. (2010); 
Lin (2006); Greco et al. (2008) discuss the use of QL functions in the Bayesian setting. 

Let s = s(y) S K be a scalar summary statistic generated from \ 9) whose 

observed value is s 0 t> s . We assume for convenience that the summary statistic lies on 
the real line, which can be easily achieved by suitable transformations (e.g. a log- 
transformation of the sample variance). 

Moreover, suppose 9 is a scalar parameter, i.e. p = 1 and let T(s; 9) be an unbiased 
estimating function of 9 based on s, i.e. i5 w (.y|e){'I , (S'; 0)} = 0. 

The QL for 9 based on 4/(s; 9) (McCullagh, 1991), is given by 



( 1 ) 


where A(9) = M(9)/VL(9) 1 cq is an arbitrary constant, 



and 


VL{9) = E{^{S; 9) 2 | 9} = Var {tf(S; 9) \ 9}. 


When p = 1, a quasi likelihood for 9 is usually easy to derive, while for p > 1 some 
difficulties arise. Moreover, as shown below, for a suitable estimating function and under 
Var{4'(S l ;0) | 9} constant, (1) is a normal kernel. 

The ABC-MCMC algorithm, proposed in Marjoram et al. (2003), summarized in 
Algorithm 1, evaluates Ln(0) indirectly via the indicator function I p<e , and uses the 
proposal density q(9^ t ' > \ 9 ( ' t ~ 1 ^). 

Depending on how the proposal is defined, with Algorithm 1 we may implement the 
independent Metropolis Hastings (MH) or the Random Walk (RW) MH. SMC methods 
may also be considered as in Toni et al. (2009). Finally, the proposal q(-) can also be 
viewed as an importance function for the implementation of an Importance Sampling 
(IS) simulation algorithm. 

The proof that 7r e (0 | s 0 b s ) is the stationary distribution of Algorithm 1 is contained 
in Theorem 1 of Marjoram et al. (2003) and the rate of convergence depends on the 
choice of q(-) and e. However, as Ln{9) is not tractable, it is not possible to further char¬ 
acterize the stochastic behavior of the induced chain. In fact, the theoretical conditions 
discussed in Mengersen and Tweedie (1996) and Atchade and Perron (2007) may be 
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Algorithm 1 The ABC-MCMC algorithm 

l: Set e > 0, = Bi n it G 0; 

2: for t = 1 to T do 
3: generate 9* ~ |0(* -1 )); 

4: generate s ~ n(s(y) \ 0*); 

5: calculate p = p{s 0 b s , s); 

6: with probability 



accept 9* and set 0W = 0* ; otherwise 0® = 9^~^ 

7: end for 

8: return 0W, ..., 9 (yT ' > 


assessed if Ljv (0) is available in a closed form expression. Notice that the approach can 
also be viewed as a pseudo marginal MH as we are working with an estimated likelihood 
when evaluating the indicator function I p<£ , and results for convergence of such pseudo 
marginal algorithms can also be found in Andrieu and Roberts (2009). 


3 The ABC^ for a scalar parameter 


The main objective of this paper is to construct a proposal density centered on the 
bulk of the posterior distribution. In a setting where the likelihood cannot be computed 
explicitly, this issue could be cumbersome. For this purpose, we consider a QL derived 
from estimating functions based on s. 

The following Proposition 1 provides the expression of Lq{9) for a general statistic 
s and assuming Var^i q^(S \ 9) is constant. 

Proposition 1. Suppose p = 1 and let f{9) = \ 9) be a bounded regression 

function under the sampling model n(y \ 9) for which the Jacobian | f'{9) |< oo and 
that the conditional variance ’VcUv^ie^S' \9) = u 2 R is constant with respect to 9. 

Consider the following estimating function J>(s O 6 S ;0) = s 0 b s — f(9). In this case we 
have 



( 2 ) 


where </>(•) is the density of the standard normal distribution. 


Proof. Note that the estimating function is unbiased because i?{T(s 0 b s ; 0) | 0} = 
E(S | 0) — /(0) = 0. From the definition of Lq{9 ) the following quantities are needed: 
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fi(0) = E{^(s obs ;9) 2 | 9} 


= Var(S-/(0)|0) 
= Var(5 | 9) 


Then A(0) = f'{9)/a R and by (1) we have 


Lq{0) = exp 



J-^r-(s obs - f(t))dt 

& n 


R 


1 

oc — exp 
ctr 


U(d)~s obs ) 2 \ 


2 A ) 


which is the kernel of the normal distribution centered at s obs with variance <t 2 r . □ 

Expression (2) suggests that Lq{9) is a normal density when working in the f(0) 
parametrization, and thus if one is able to make a change-of-variable from f{9) to 9, it 
could be employed as a proposal for MCMC-ABC algorithms, leading to a broad class 
of ABC methods denoted ABC g ; algorithms. 

Note that the constant variance assumption, Var 7r (j / |@)(S' | 0) = cr R , leads to a 
closed-form proposal distribution, a normal density. However, as this assumption may 
be restrictive, we extend the idea of constructing proposal distributions based on Lq(9 ), 
but assuming a non constant variance, <j 2 r (9) 1 which can be estimated as well as f(9) 
(see Subsection 3.1). The theory of QL assures that also for non constant cr R (9) there 
exists a corresponding Lq{9 ) whose form is intractable and for this reason it cannot 
be used directly as a proposal density. Instead, our proposal distribution for a Random 
Walk Metropolis Hastings (RWMH) is based on a distribution of the form of Lq(0) in 
(2) where cr R (9) is non constant: 



(3) 


Finally, this way of constructing proposals could also be extended to other types of 
distributions with heavy tails, such as the f-Student distribution. 

On the other hand, ABC Importance Sampling (ABC-IS) can be implemented using 
q(9) = Lq{9)\/'{9)\ as the importance function from which it is possible to simulate. 
Assuming <jr(9) = an, it would be enough to simulate a sample z from a standard 
normal and then calculate f~ 1 (zan+s obs ) to have a draw of 9. The ABC-IS is completed 
by the evaluation of the importance weights by means of q(9). Also ABC-SMC can be 
used starting with a sample from q(9) and using steps S1-S3 from Toni et al. (2009). 
The computational requirements are almost the same as for the ABC-IS. In fact, for 
ABC-SMC it is necessary to calculate importance weights to be updated in the MC 
sequence. 
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Algorithm 2 The ABC g z for p = 1 

Require: /, f'(8), er R {9), or their estimates (/, f'{9), cr 2 R {8)). 
l: Set e > 0 and 9 = / _1 (s 0 bs); 

2: for t = 1 to T do 
3: generate 

f* ^ N (f(e^),a 2 R (e^)y, 

4: set 9* = {9 :/-!(/*) = 9}; 

5: generate s ~ 7r(s(y) | 0*); 

6: calculate p = p(s 0 b s , s); 

7: calculate the derivative, f'(9), of f(8), at 9^ t ~ 1 ' > and 9*; 

8: with probability 


. r ^(e*)^*- 1 ) I r) ¥ 

mm \ ’ 7r(6»(‘- 1 )) g Q(r I 6K*-!)) p<£ 

accept 0* and set 0^ = 0*, otherwise 0W = 

9: end for 

10: return 0W, ..., 0( T ) 


3.1 Estimation of f(6), f'(0) and cr 2 R (0 ) 

The function /(0) can be elicited, suggested by the model in Mengersen et al. (2013); 
Ratmann et al. (2007) or by theoretical arguments as in Heggland and Frigessi (2004). 
For instance, in the genetic model analyzed in Mengersen et al. (2013), where the 
constraint of the empirical likelihood plays the same role as ^(sobs;#), f(8) is built 
upon the score function of the pairwise likelihood corresponding to the model. However, 
except in few specific situations, f(9), f'(9) and cr R (9) are generally unknown, and we 
replace them in Algorithm 2 by estimates that can be obtained in a pilot-run simulation 
as stated in Algorithm 3. In the sequel, when referring to the ABC g z algorithms, our 
intention is always that f(8), f (9) and cr R (9) are unknown and replaced by an estimator 
with the sole purpose of providing the input as a proposal density for Algorithm 2 
(or Algorithm 5). If f(8), f'(6) and cr 2 R {9) were known, the computational effort for 
the ABC g ; algorithm would be reduced. In any case, the proof of the convergence of 
Algorithm 2 (or Algorithm 5) is discussed in the previous Section 2. 

The functions f(8) or d R (d) can be any estimator which provides smoothing re¬ 
gression functions, and f(9) is at least once differentiable. This implies that the main 
assumption for f(6) is to be monotone and once differentiable. We find it useful to use 
smoothing splines, for which the derivative, f'(9), can be obtained analytically from 
splines coefficients. Other choices are possible and left to the convenience of the analyst 

f n I M 

that inspects the scatter diagram of points and provides a goodness-of- 

fit argument that justifies the choice. The inverse / -1 (/*), at some point /*, can be 
obtained either analytically, with the bisection method on f(8) = f* or by numerical 
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Algorithm 3 Estimation of f(d), f'{9) and cr 2 R {9) for p = 1 

Require: M, 0 

1: consider M values 9 = ( 6 1 ,..., 9m) taken in a regular spaced grid of a suitable large 
subset 0C0; 

2 : generate s = (s i,..., s M ) where s m ~ n(s{y) \ 9 m ); 

3: regress s on 9 obtaining f(0) and /'(#); 

4: regress j log (f(9 m ) - s"m) } on 0 obtaining d 2 R (9). 

5: return f(0), f'(9) and a R (9). 


minimization of (f(0) — /*) 2 , e.g., by a Newton-Raphson algorithm. 

Some observations are appropriate. 

i) Since we are able to simulate from n(y \ 9), then f{9) and a 2 1 (9) can be practically 
estimated with a precision that depends on the available computational resources. 
Also more precision can be achieved by making the regular grid 0 wider or by 
increasing the number of simulations, M. Values of M ranging from 100 to 1000 
are enough in the examples discussed later for p = 1, while larger values are needed 
for p > 1, due to the curse of dimensionality as the computational effort increases 
exponentially with p. 

ii) The range of 0 should always be large enough to include the observed s 0 b s in 
order to gain precision around the bulk of the target posterior n e (9 \ s 0 b s )■ 

Hi) The monotonicity assumption is a necessary condition for ABC as it states that 
there exists a relation between s and 9 through /, see e.g., Ratmann et al. (2007). 
The lack of monotonicity is not a fault of the proposed method, but instead it 
is an indication of the fact that s is not informative for 9 in some subset of the 
parameter space. This would be automatically recognized by the proposal as it 
would be essentially flat in such a region due to a small Jacobian. 

iv) The function Lq(9) can also be useful to fix e, because under the assumption 
that 7r/v(0 | y) is Ln(9) dominated and that Lq{9) was its approximation, then it 
would be enough to simulate 9 from Lq{9) and s from S(y)\9 obtaining thus the 
distribution for p(s 0 b s ,s) and fixing e as its suitable quantile. 

v) The computational cost for approximating ttn(9 \ y) with the proposed approach 
should include that for estimating f{9), approximating the inverse / _1 (/*) and 
its Jacobian /', where the latter is mainly important for p > 1, as for p = 1 
the derivative is obtained analytically. Such costs in the implemented examples 
are actually reduced by another interpolation of the inverse and Jacobian with 
splines, or its corresponding equivalent Generalized Additive Model (GAM) for 
p > 1 (see the next section). This interpolation speeds up the MCMC because 
at each step we do not need to calculate its inverse and Jacobian, but just its 
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interpolation. Finally, because the most important cost is model simulations rather 
than regression estimation, we note that, in the analyzed examples, the number 
of simulated statistics M is no larger than 10% of the number of MCMC steps T. 
Such computational effort is enough to achieve the desired precision in formulating 
a proposal distribution. 


4 The ABC^ for p > 1 

Suppose p > 1 and let s D b s be the vector of the p observed statistics. In this case the 
theory for Lq(G ) is not well developed for finite samples; however, when p > 1, Lq(G) 
exists if and only if the matrix M[G) is symmetric. In the multi parameter case, we 
use the regular asymptotic argument for the likelihood (see e.g., Pace and Salvan, 1997, 
Ch. 4). That is, for n oo, the Taylor expansion of the QL around its mode leads to 
the following multivariate normal QL: 

L q {6) = E" 1 exp (-±(/(0) - s o6s ) t £^(/(0) - s ohs )) , 

where f{G) = E{ S | G) is a bounded monotone, and possibly non-linear regression 
function and £# is the conditional covariance matrix of S | G. Following the same 
approach as for p = 1, we use Lq(G) as a proposal distribution to be used in a MH 
scheme (see Algorithm 5) also considering a non constant covariance matrix £#(<?), that 
is 

q Q(e I G^) = N p (/(0 (t - 1 )), I J(P) I, (4) 

where N p (•, •) denotes the p-variate normal distribution with mean f(G l ' t ~ 1 ' > ) and vari¬ 
ance-covariance matrix £_r( 0^ -1 -*), and J(G ) is the Jacobian of the transformation 
/(#) = E(S | G). 

In the multi-parameter case we have a system of non-linear equations f(G) = s 
whose solution is G = / -1 (s). Such a solution and the calculation of the determinant of 
the Jacobian, \J(G)\, can be obtained using numerical methods for solving a non-linear 
system of equations and approximating the derivative of f(G) at G. In the case of non 
constant covariance matrix, we use the proposal distribution with covariance 

/ ojn (G) 0 0 

£fl(0)= 0 

V o o 

where <j 2 r1 (Q), ..., <J Rp (G) are the conditional variance functions for each component of 
s with respect to all p components of G. Note that in this case we are forced to use a 
diagonal covariance matrix in order to guarantee that £_r(<?) is positive definite. The 
correlation between the p parameters is then accounted for in the MCMC sampling. 
Algorithm 4 illustrates how to obtain estimates of f(G) and T,r(G) along with the 
calculation of the Jacobian corresponding to f(G) for p > 1. 
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Algorithm 4 Estimation of f(0), J{0) and T,r(0) for p > 1 

1 : Consider the set of to = 1,, M p points 9 m = (0\ m ,..., 9 pm ), each of p scalar 
coordinates over a regular lattice of 0i x ... x @ p and let 9 be the M p x p matrix 
of all points; 

2 : Generate s m ~ ir(S(y) | 9 m ) and let s be the M p x p matrix of all simulated 
statistics; 

3: for all j = l,...,p do Regress column j of s, Sj, on 9 m obtaining fj(9) and 
regression residuals e 7 . Calculate J(9) using Richardson’s extrapolation (using R 
package numDeriv). 

4: end for 

5: Let e = (ei,..., e p ) be the M p x p matrix of regression residuals, 

6 : if E# (6) = Yir constant then 

7: calculate T,r = M _1 e T e; 

8: end if 

9: if E r(6) is non constant then 

10 : regress log(e^) on 9 m to have a Rj ( 9 ) for j = 1.....p and obtain E r{6). 

11 : end if 

12 : return f(0) = (f 1 (9 1 ),..., f p {9 p )), J{9) and E R {0). 


Once we have estimated f(0) and E^ with f(0) and E#, respectively, we can cal¬ 
culate the proposal q®(-) in (4) and apply Algorithm 5. This is just Algorithm 2 in its 
multivariate version, where the distance function p : IR P —> 1R + must consider the joint 
distance of all p coordinates of s with respect to s 0 j, s . Also here Lq(9 ) can be used to 
fix e in two ways. The first solution, which is the one adopted in this paper, consists of 
considering a common e for all p dimensions by characterizing the stochastic norm of 
||s — s 0 j, s || and its quantiles. The second solution would be to consider different tolerance 
parameters, one for each of the p dimensions, by deriving the p marginals from the joint 
proposal and then, at each iteration t, updating each one of the p parameters separately. 

Finally, in order to speed up the MCMC algorithm, especially for large p, it is worth 
noting that once f(9) is estimated, its inverse and the Jacobian J{9) can be further 
interpolated by means of their respective values calculated on the points of 9 used for 
the pilot-run. With such interpolation, the inverse and Jacobians are calculated only 
on the points of the grid ( M p in total), which is much less computationally demanding 
than a calculation for all MCMC steps. 

Many of the remarks outlined in i)-v) hold in the multivariate case as well. In order 
to guarantee enough flexibility, and because we are mainly interested in predicting s, 
we consider f(0) and d- 2 R1 (9 )...., &r p (9) : to belong to the class of generalized additive 
regression models (Stone, 1985) in which each component of 9 enters into the linear 
predictor by means of a smoothing spline as discussed, for instance, in Section 12.2 of 
Faraway (2006). The Jacobian of the non linear system, J(9) which relates p summary 
statistics to the p parameters, is calculated using Richardson’s extrapolation (imple¬ 
mented in the R package numDeriv). Finally, the inverse of the non linear system of 
equations at some point s is obtained by Newton steps as detailed in Dennis and Schn- 
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Algorithm 5 The ABC g z for p > 1 

Require: /, J{0) and £jt(9), or their estimates (/, J{6) and £_r(0)). 
l: Set e > 0, 0 O = / _1 (s 0 6 S ); 

2 : for t = 1 to T do 
3: generate 

/* ~ N p (/( 0 (t-1) ), ■ 

4: set 9* = {0 : = 9} ; 

5: generate s ~ 7r(s (y) \ 9*)\ 

6: calculate p = p(s 0 f, s ,s); 

7: calculate the determinant of the Jacobian matrices and J{9*)\ 

8: with probability 


min 


n(9*)qQ{9^ | Q*) 
tt( 9^)qQ{9* | 9 {t ~V) P< 


accept 9* and set 9 l ' t ' ) = 9*, otherwise 9^ = 

9: end for 

10: return 9 I,1 ' > ,..., 9 (1 * 


abel (1996) (implemented in the R package nleqslv). 

We acknowledge that the approach proposed here limits the number of observed 
statistics to be equal to the number of unknown parameters. This is in line with the 
general recommendation to keep the number of statistics, i.e. the number of estimating 
functions, equal to the number of parameters, as also discussed in Mengersen et al. 
(2013) and Ruli et al. (2013). 


5 Examples 

In this section we illustrate the proposed approach with four examples. The first is a 
coalescent model (Tavare et al., 1997) with a scalar parameter of interest. The second 
example is a gamma model with two unknown parameters, and the third is the g-and-k 
distribution (see, e.g., McVinish, 2012) with four unknown parameters. The last example 
is an application to a real dataset concerning GWAS, with three unknown parameters. 
In the first example we apply Algorithms 2 and 3, whereas for the other examples we 
apply Algorithms 4 and 5. 

The coalescent model and four-parameter g-and-k distribution are considered as 
benchmark examples. The example of the gamma model is useful in order to validate 
the procedure against a known L^(9). Finally, the example of GWAS is relevant for 
analyzing population genetic isolates for which the genealogy tree is known. Although 
these kinds of data are rare, they are exceedingly more powerful for detecting genes 
related to some phenotypes than the usual and more costly GWAS applied to larger 
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samples of open populations. In all the examples we have used RW-ABC-MCMC, al¬ 
though in Examples 1 to 3 we have used also the independent MH ABC algorithm, 
obtaining similar results. This is because the proposal distribution, constructed under 
the framework of QL functions, is located around the bulk of the posterior distribution, 
7 T e (0 | s 0 bs)- Finally, the use of proper priors, in some examples, has the sole purpose 
of allowing comparison with existing methods. All examples, except the GWAS, have 
been implemented assuming that ctJj (9) and Er(0) are non constant. In the sequel 
ABC q i denotes the types of algorithms where the proposal for the MCMC has been 
approximated as explained above. 

5.1 Coalescent model 

In the following we consider an example from population genetics, namely the coalescent 
model analyzed in Tavare et al. (1997) and Blum and Frangois (2010), among others. 
Given a set of n DNA sequences, the aim is to estimate the effective mutation rate, 
9' > 0, under the infmitely-many-sites model. In this model, mutations occur at rate 
6' at DNA sites that have not been hit by mutation before. If a site is affected by a 
mutation, then it is said to be segregating in the sample. In this example, the summary 
statistic s' = y is the number of segregating sites. The generating mechanism for s' is 
the following: 

1 Generate T n , the length of the genealogical tree of the n sequences, where T n = 
yr_ 2 jWj , where Wj are independent Exponential random variables with mean 

2/j(j — 1) such that T n has mean [iT ri = 2 )Cj=i 1/j and variance 

4 n =4ES Vi 2 ; 

2 Generate (S' \ 0',T n ) ~ Poisson(9'T n / 2). 

Hence, the likelihood L^(9') is given by the marginal density of (S' \ 9') with respect 
to T n , which has a closed form only for n = 2 as T 2 ~ Exp(l/2). For large n, we 
approximate the inner integral in T n by simulating 10 5 Wj for j = 1,... ,n and then 
obtaining the marginal density of (S' \ 9') by averaging over these 10 5 simulated values 
of T n . This parametric approximation is denoted by 7 T ap (9' | s' obs ) and relies on the 
partial knowledge of the likelihood as the marginal density is obtained using the Poisson 
likelihood. 

In order to employ ABC g j, we consider 9 = log(d') and s = log(s' + 1) instead of 
9' and s ', respectively. As an example, we consider n = 100 and an observed value of 
Sobs = 2, which is a value likely to be obtained under 9 = 0 (and hence 9' = 1). Figure 
1 reports the result of the pilot-run study (top-left) with M = 1000 with /, a\(9) 
(top-right), the calculated Jacobian with the splines coefficients (bottom-left) and the 
approximated posterior (bottom-right) with e being the 10% quantile of the distribution 
of p(s, Sobs) simulated from 5|0, where 9 ~ Lq(9). 

From Figure 1 we can see that the chosen summary statistic is not informative for 
small values of 9 because observing no segregating sites with n = 100 samples may 
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Figure 1: Example: coalescent model. (Top-left) Realizations of s for the pilot-run study 
along with the estimated / and s 0 b s = 2 (green). (Top-right) The estimated conditional 
variance of s|0, a\(0). (Bottom-left) The Jacobian of f{9) in the grid of the pilot-run. 
(Bottom-right) The approximated posterior 7r(0|s o ;, s ) with 95% credible interval and 
posterior mean (green). 


occur for almost every mutation rate lower than e -5 . This is, of course, not a fault of 
the method, but of the chosen summary statistic and it may also occur in the standard 
original ABC approaches. Moreover, this is reflected by the proposed method as the 
estimated Jacobian is near 0 for values of 9' < e~ 5 . It can also be seen that for larger 
values, the approximated Jacobian is nearly constant, suggesting that there exists a 
linear relation between s and 9. 

For 7 t( 0) = Exp( 1) and n = 100 we calculated, for each dataset simulated at different 
values of 9 £ (2,3,..., 10), the relative difference of quantiles of each posterior with 
respect to those obtained with the parametric approximation, as in Blum and Frangois 
(2010). The relative difference is defined as (Q P — Qp)/Qp where Q p and Q ° are the p-th 
quantiles of the ABC posterior and that of the parametric approximation, respectively. 
Figure 2 shows the relative differences. We can clearly see that these differences are 
more robust with respect to 9 for the ABC g ; rather than for the ABC, and this is due 
to the impact of the prior in the standard ABC algorithm. In fact, for 9 —>• oo the 
discrepancy between prior and posterior becomes important. 


5.2 Gamma model with unknown shape and scale parameters 

Let y ~ Gamma{9i, 02) with mean exp(0i — $ 2 ) and variance exp(0i —292), where 9\ and 
1/02 are the log of shape and scale, respectively. We have p = 2, 6 £ 0 = ©i x 0 2 = IR 2 
and consider the following two statistics s = (si, S 2 ), where si and S 2 are the logarithms 
of the sample mean and the standard deviation, respectively. For 9 = (0, 0) we consider a 
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Figure 2: Example: coalescent model. Comparison of ABC and ABC g j with the para¬ 
metric approximation in terms of relative differences between quantiles of the parametric 
approximation and those of ABC and ABC g ;. 


sample of size n = 10 with s 0 b s = (—0.12. —0.26) and estimate the posterior distribution 
under two independent standard normal priors, tt(0) = 7r(6b) X7 t(02 ) = N(0, 1) x N(0, 1). 
We consider the estimation of f(9) with M = 100, p = 2 over a regular lattice of 
M 2 = 10 4 points in (—2, 2) x (—2, 2) £ ©i x © 2 - 

Figure 3 shows the conditional regression functions of each statistic against the 
parameters, which appear quite linear. Figure 4 illustrates the log of squared residuals 
and the estimation of the conditional variances d’^ 1 (0) and <rfj 2 (0) with respect to 9. 
Figures 5 and 6 illustrate the MCMC output. From the former we can deduce that the 
chain mixes well, and that marginal posteriors 7 t £ ((9i | s D t, s ) and 7r e (02 | Sobs ) are centered 
around the true values. From the latter, we see that the bivariate density n e {6 \ s 0 b s ) 
obtained by the ABC g q and the true underlying posterior \ y) are similar in terms 

of contour levels. 

We notice that the output of the MCMC in Figure 6 is similar to that obtained 
using E/j constant (not reported here). This is presumably because, in this example, 
the use of logarithms on the scale of the summary statistics stabilizes their conditional 
variance with respect to 6. Another reason is because the chain moves in a radius of 
Sobs < e where the variance functions a 2 n(9), bfj 2 (0) are almost constant and do not 
differ significantly from the estimated diagonal of E/j when it is assumed constant. 


5.3 Four-parameter g-and-fc distribution 

Distributions based on quantiles are of great interest because of their flexibility. However, 
although a stochastic representation is available, their density and hence the likelihood 
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Figure 3: Example: gamma model. Conditional regression functions of each statistic 
against the two parameters. The green point is the observed value. 



9i 


Figure 4: Example: gamma model. Conditional variance functions of each statistic 
against the two parameters. 


Ln(9) are not available in closed form and, in general, they are difficult to evaluate. 

We focus on the four-parameter (p = 4), g-and-k distribution, which has the follow¬ 
ing stochastic representation 

* ~ N{ 0,1) 


(■y I z, 01, 02, 03 , 9a) 
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Figure 5: Example: gamma model. Marginal output of Algorithm 5, for 0\ and 02 along 
with the histogram of the marginal posteriors 7r e (0i | s 0 bs) and Tr e (02 | s 0 bs)- Vertical 
dotted lines are the true values of 0\ and 02 that are used to generate s 0 bs■ 



- 2-10 1 2 

Oi 


Figure 6: Example: gamma model. Contours for TT e (0 | s 0 f, s ) (dashed) along with 7T/v(0 | 
y) (continuous). The posterior modes of 0i and 0 2 are represented by the cross point of 
the dashed lines, where the true value is 9 — (0, 0). 
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Figure 7: Example: g-and-k distribution. Conditional distribution of Si\9i, for i = 
1,..., 4. Each boxplot represents the marginal distribution of the statistic at the speci¬ 
fied value of 9i in the horizontal axis. 


The unknown parameters 0\, exp ($ 2 ), $3 and exp ($ 4 ) — 1/2 represent location, scale, 
skewness and kurtosis, respectively (Haynes et ah, 1997). 

Such distributions have also been used for testing several ABC approaches as in 
Marjoram et al. (2003); McVinish (2012). 

We consider the following four statistics all based on empirical quantiles q of y q : 

( , , x 2 / 0.75 + 2 / 0.25 ~ 22/0.5 , / A _ to 4 

s = I 2/0.5 , log(yo .75 — 2/0.25), - , log (2/0.975 — 2/0.025) S 1R , 

V 2 / 0.75 — 2 / 0.25 ) 

with the following meaning: si is the median, si is the log of the interquartile range and 
S 3 is the skewness index described in Bowley (1937), a special case of the Hinkley (1975) 
index. Finally S 4 is the log transformation of the kurtosis index described in Crow and 
Siddiqui (1967). 

From the pilot-run simulation shown in Figure 7, it is possible to notice that the 
relationship between statistics and parameters is not linear. In this example, we con¬ 
sider a sample simulated under the scenario discussed in Fearnhead and Prangle (2012) 
in which n = 10 4 observations are generated with 9 = (3,0, 2, — log(2)) and the uni¬ 
form prior on [ 0 , 10 ] x [—log( 10 ),log( 10 )] x [ 0 , 10 ] x [— log( 10 ),log( 10 )] is considered. 
Figure 8 reports the conditional distributions of the logarithm of squared residuals, in¬ 
dicating that the variance may be non constant. Therefore, for this model, we estimated 
with GAMs the conditional variance functions d‘p i:) (0), for j = 1,... ,4 as explained in 
Algorithm 4. 

Figure 9 shows the output of the ABC q z with the RW-ABC-MCMC algorithm for 
the four marginal posterior densities, for a given sample. We can see that the marginal 
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Figure 8: Example: g-and-k distribution. Logarithms of squared residuals in the pilot- 
run. Each boxplot is the conditional distribution of log(e^)| 9j, j = 1, 2, 3,4. Such values 
indicate that 'Sr(O) is not constant with respect to 9. 


posterior distributions include, in their high posterior density interval, the true value of 
9 and the chains have good mixing. 

Finally, the performance of the proposed method is assessed by a simulation study of 
50 simulated datasets. As in Fearnhead and Prangle (2012), the 50 datasets are simu¬ 
lated from 50 different parameter values sampled from the prior. The expected quadratic 
errors for each marginal ABC q z posterior are reported in Figure 10, where also the Mean 
Squared Error (MSE) of the Semi-Automatic ABC, taken from Fearnhead and Prangle 
(2012), is shown for comparison of the order of magnitudes. 

From Figure 10, we can see that the expected quadratic error under ABC q z is 
compatible with that reported for the Semi-Automatic ABC. Notice that our method 
uses a set of four observable summary statistics that differ from those ones used in 
Fearnhead and Prangle (2012). 

5.4 GWAS for isolated populations with known genealogy 

In this application, we address the problem of estimating DNA markers related to 
a certain phenotype such as, for instance, the presence of a certain disease. In this 
problem genotype is represented by a large set of DNA sequences known as Single- 
Nucleotide Polymorphisms or SNPs in the sequel. Such SNPs are usually observed in 
millions per individuals and thus fast and reliable statistical methods are needed in order 
to answer the scientific question as to which SNPs are mainly related to the disease. 
A dataset for a case/control study is usually collected on an open population where the 
degree of inbreeding, that is the mating of pairs who are closely related genetically, is 
unknown as it is usually negligible in open populations. Nonetheless, there exists certain 
evidence in the genetic literature that very valuable indications regarding SNPs/Disease 
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Figure 9: Example: g-and-k distribution. The approximated marginal posterior n e (9i \ 
Sobs): ^ ~ 1; 2, 3, 4. 
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Figure 10: Example: g-and-k distribution. The marginal distributions of the Log of 
MSE for each of the four parameters, in 50 replications of the considered simulation 
scenario, along with the Log of MSE of the Semi-Automatic ABC procedure reported 
in Fearnhead and Prangle (2012) (stars). 


relationships may come from the study of isolate genetics, i.e. human samples for which 
inbreeding is also relevant and known. Such types of collected samples are very rare 
because there are very few genetic isolates in the world, and so the statistical methods 
to analyze them are not very well developed. One example of a genetic isolate for which 
data are also available is the Sardinian genetic isolate of the Ogliastra region, which is 
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situated in the center of the island of Sardinia (Cabras et ah, 2011). 

An example of such data may come from Figure 11 in which we have a population 
composed of 4 families, 18 individuals and 2 SNPs labeled as SNP1 and SNP2. From 
Figure 11 we have that ancestors have not been observed because have died (white); 
while offsprings are labelled as healthy (green) or affected (red) along with their SNP 
configurations. 

Data in Figure 11 may be formally represented as follows: for individual i let 
Yi £ {0,1} represent the indicator of the phenotype, i.e. Yi = 1 if affected and F) = 0 
otherwise; assume X, £ {{aA, Aa}, aa, AA} represents the genotype, e.g. the SNP con¬ 
figuration with three levels. For a genealogy composed of N individuals we have to model 
the corresponding pairs (Yi, Ad),..., (Yv, Aat), where only n < N have been observed. 
At the phenotype level we assume the usual logit model (Y)|Aj, 6) ~ Bernoulli^), with 
Pi being the probability that individual i is affected. 

If one considers the data in Figure 11 as n = 10 independent observations and 
estimates, for instance, a logistic regression model of Y against A, or just considers the 
Fisher exact test among Y and A, one would end up finding no association. In particular, 
the Fisher exact test between Y and the first SNP has a p- value of 0.21 while with SNP2 
it is exactly 1. While the latter p- value is reasonable as there is apparently no association 
between SNP2 and Y, the former is not, because all aa individuals are affected and all 
AA individuals are healthy and therefore there should be certain evidence of association 
between the first SNP and affection status. The shortcoming of this analysis is that it 
treats all individuals as independent and identically distributed, while it is clear that 
they are not. 

On the other hand, the situation of highly dependent observations complicates the 
statistical model. In fact, the probabilistic model for the sample must take into ac¬ 
count the genealogy that underlies the genetic variant transmission and also the model 
which relates the phenotype to the genotype. As the genotype is observed for the very 
last generations only, the configurations of the SNPs for the previous generations then 
constitute an enormous number of random latent variables. This makes it almost im¬ 
possible to write the likelihood for the parameter relating the SNPs configuration with 
the phenotype, that is, the coefficients of a logistic regression between Y and A. 

In this application we illustrate the ABC g ; method for the data in Figure 11 and 
we also consider a sample from the village of Talana (Ogliastra, Sardinia - Italy) that 
is affected by a reduced Mean Cell Volume (i.e. a MCV<72) disease for which it is 
known that there exists a genetic variant inside the Beta-Globin gene which determines 
it. The data, provided by the Centro Nazionale Ricerche of Italy, consists of N = 
1997 individuals, all in one tree, originating from two common ancestors. Only n = 49 
individuals of the later generations are observed, among whom only 5 are affected. 
Moreover, the proportion of affected, similar to the prevalence of the disease, is around 
13%. There are 91 SNPs with three levels and we know that only one is inside the 
Beta-Globin gene. 

In this analysis, we treat each SNP separately and set up a stochastic model for 
a single SNP. The overall analysis for all SNPs is made by the sequence of separated 
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Figure 11: Example of a genealogy tree with 4 families, 18 individuals and 2 SNPs 
(SNP1,SNP2). Ancestors have not been observed (white); while the offspring are labelled 
as healthy (green) or affected (red). 


analyses over all SNPs. In the linear predictor of the logistic regression, we consider as 
covariates the genotype for individual i: 

logit(pi) = OiI Xi ={aA,Aa} + ^I{ Xi=aa } + ^3^{Xi=AA}- 

The vector of coefficients 8 are usually interpreted as the log of the odds ratio for 
the probabilities of being affected given a SNP configuration. In order to account for 
the fact that the sample (yi,xi ),..., (y n ,x n ) is not i.i.d. we include the transmission 
model for the genetic variants. Specifically, let Xi 1 and Xi 2 be the SNP configuration 
for the ancestors of individual i. Then the probabilistic model for the transmission of a 
genotype variant is assumed to be regulated by the usual Mendelian inheritance model 
of transmission, where the ancestor individuals are assumed to be known, according 
to the genealogical tree. This law is also known as the law of independent assortment, 
segregation or dominance, see for instance Levitan (1988). Therefore, if individual i is 
a descendent in the tree 


(Xi\X^,Xi 2 ) ~ Mendel’s law, 

while if i is a founder or her/his ancestors are not in the tree, then we assume the 
following prior distribution for configuration of ancestors: 

Xi ~ Trinomial(l/3,1/3,1/3). 

The summary statistics, calculated only for the n observations, are the observed 
log-odds of the proportion of affected among all individuals that have a certain config¬ 
uration. Specifically, let ^{w} count the number of occurrences of type w, 

l + #{F=l|X = fc} \ 

2 + #{X = k} )' 


Sk = log 
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Figure 12: Pilot-run for the genealogy tree in Figure 11 with M = 30 3 points. Each 
boxplot is the conditional distribution of Sk\ 8 k, k = 1, 2, 3. 


where k = 1,2,3 corresponds to X = aa , X = { aA , Aa} and X = AA, respectively. Note 
that we add one individual in the numerator and two in the denominator in order to 
guarantee that si, S 2 and S 3 are always defined. This, of course, constitutes a limitation 
for very small samples from which, however, it would be difficult to estimate very strong 
signals for a SNP that is a risk or protection factor. 

In order to run ABC g ;, we performed a pilot-run simulation with M = 30 3 points on 
a regular grid of log odds ratios between -10 and 10. This pilot-run study depends only 
on the genealogy tree and not on the observed genotypes or phenotypes. In the case of 
GWAS analysis, this provides a saving in computational efforts, as in general there are 
many genes to be analysed. 

The results for the pilot-run study are summarized in Figure 12 where we can see that 
the chosen statistics are quite informative around the null hypothesis of no association. 
For very large signals, e.g. |0fc| > 3, the summary statistics are very weakly informative. 
This is not the fault of the summary statistics, but it is due to the small observed 
sample, as is typical in genetic isolates. 

Figure 13 illustrates the conditional distributions of the logarithm of squared resid¬ 
uals with respect to 8 1 , 82 and 83 , which could be used to estimate the variance function 
E^(0). However, in this case we find it reasonable to assume that (G) is constant 
and we estimate it with Y,r as explained in Algorithm 4. 

We complete the algorithm by using the Euclidean distance p(s, s 0 f, s ), between s and 
its observed value s 0 b s further weighted by a term that takes into account the simulated 
configurations for the SNP. This term makes the distance tend to 0 when there are 
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Figure 13: Logarithms of squared residuals in the pilot-run for the genealogy tree in 
Figure 11. Each boxplot is the conditional distribution of log(e|)|0fc, k = 1,2,3. Such 
values indicate that T,r(0) is constant with respect to 9. 


many matches between simulated and observed configurations, 



n 

The tolerance parameter e has been fixed in order to obtain an acceptance probability 
in the RW-ABC-MCMC algorithm around 30%. The output of the chain for data in 
Figure 11 is represented in Figure 14. 

From Figure 14 we can see that SNP1, which has the largest signal, exhibits val¬ 
ues of 9 with the largest posterior mean and the largest uncertainty. Moreover, the 
approximated marginal posteriors for 9\ and $3 for SNP1 are very skewed. For SNP2 
where there is no signal, posterior distributions are centered around 0. These results 
are also reflected by the logarithm of the Bayes Factors (BFs) for (0 > 0 | s 0 j, s ) against 
(0 < 0 | s 0 b s ), Pr(0 > 0 | s o 6 s )/Pr (0 < 0 | s 0 bs) which is defined as long as the posterior 
of 9 | s 0 bs exists. In fact, there is substantial evidence for the configurations of SNP1 to 
be risk or protective factors, but not for SNP2. 

We repeated the above analysis for the Talana data and found that the SNP inside 
the Beta-Globin (rsll036238) is among the first three SNPs, out of 91, with the highest 
posterior mean (in absolute value) as shown in Figure 15. Those SNPs also have the 
largest Bayes Factors. However, the greater uncertainty for the first three SNPs in 
Figure 15 is due to the fact that with only n = 49 observed individuals we cannot be 
very precise in estimating very large signals as discussed above. 
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(a) RW-ABC-MCMC for SNP1 


(b) RW-ABC-MCMC for SNP2 



Figure 14: Output of the analysis for data in Figure 11. Chain output for RW-ABC- 
MCMC for the two SNPs (a,b) and corresponding density estimation (c) with posterior 
means represented by dots. Logarithm of Bayes Factor for (0 > 0 | s 0 b s ) against (0 < 
0 | s 0 b s ) along with the reference lines at ±1.6 (d). 


rsl 1036238 rs61747674 rs904251 rs2363956 rs6533125 rs6999234 rs4298170 rs6702384 rs2153157 rs261950 


Figure 15: For data from the village of Talana we report, for 40 different SNPs, 95% 
credible intervals for those 0s with the largest posterior mean (in absolute) value (dot). 
The posterior is approximated with the RW-ABC-MCMC. 


6 Conclusions 

Recently, the idea of using simulation from the model to approximate the distribution 
of summary statistics seems to be proliferating in the ABC literature (see, e.g. Prangle 
et al. (2013b); Wood (2010); Ratmann et al. (2014)). In this paper we also used this type 


























































Stefano Cabras, Maria Eugenia Castellanos, and Erlis Ruli 


435 


of approach. In particular, we simulate from the model varying the parameters in a grid 
to approximate the distribution of summary statistics as a function of the parameters 
in order to build a suitable proposal for ABC-MCMC. Such a proposal distribution can 
be implemented in a RW fashion or can be used as an independence kernel, although 
we focused mainly on RW type MCMC algorithms. 

In scalar parameter problems with conditional constant variance of summary statis¬ 
tics with respect to the parameters, we showed by using the definition of quasi-likelihood 
(McCullagh, 1991) that this proposal is a normal kernel in the auxiliary space, f(8). 
In multiparameter problems or when the variance of the regression function cannot be 
assumed to be constant, the theory of quasi-likelihoods only suggests a form for the 
proposal. In fact, analogously to the scalar parameter case with constant variance, we 
propose using a multivariate normal kernel in the auxiliary space. 

A key point for the success of our method is that the summary statistics must vary 
when changing the parameter values. Moreover, there must be a one-to-one relation 
between them and again, the choice of s is critical, as an ancillary statistic is useless for 
gathering information about 6 . Such a non ancillarity assumption is usually required 
over the whole parameter space 0 and it may happen that there could be parts of the 
parameter space where s is locally ancillary. This happens, for instance, in the coalescent 
model for low mutation rates and also in the application to GWAS. Such ancillarity, 
however, is properly accounted for in the discussed proposal density for ABC-MCMC. 
Another problem with our approach may lie in the asymptotic argument for p > 1 which 
may not hold in some applications when Lq(0) is irregular. 

The proposed ABC g ; seems to perform quite well in the above examples compared to 
other available methods; it is straightforward to apply and its implementation does not 
require more than just basic notions of regression analysis. We discussed two possible 
ABC-MCMC algorithms, the RW-MH with or without constant regression variance. 
Although we focus mainly on the non constant variance assumption, we found that 
the original independent MH with constant regression variance leads to satisfactory 
results for the discussed examples. This is because we are only implementing a proposal 
distribution for the ABC-MCMC and also because the estimation theory of QL functions 
allows for a good approximation of L^{6) which is reflected in the proposal. 

Furthermore, when simulation from ir(y \ Q ) is costly another alternative to ABC 
could be using the Lq(Q ) as a surrogate of Ln(Q) as in Cabras et al. (2014). For other 
surrogate pseudo-likelihoods used in the ABC context, see also Mengersen et al. (2013); 
Pauli et al. (2011) among others. 
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